1 Building effective queries in natural language information retrieval
Tomek Strzalkowski, Fang Lin, Jose Perez-Carballo, Jin Wang 

March 1997 Proceedings of the fifth conference on Applied natural language processing 

Full text available: pdf(771.03 KB) Additional Information: citation, abstract, references, citings

In this paper we report on our natural language information retrieval (NLIR) project as related to t 
. concluded 5th Text Retrieval Conference (TREC-5). The main thrust of this project is to use natura 
processing techniques to enhance the effectiveness of full-text document retrieval. One of our goa 
demonstrate that robust if relatively shallow NLP can help to derive a better representation of text 
statistical search. Recently, we have turned our attenti ... 

2 Mining new media: Newsjunkie: providing personalized newsfeeds via analysis of informatics 
Evgeniy Gabrilovich, Susan Dumais, Eric Horvitz 

May 2004 Proceedings of the 13th conference on World Wide Web 



Full text available: pdf(152.18 KB) Additional Information: full citation, abstract, references, index terms



We present a principled methodology for filtering news stories by formal measures of information 
how the techniques can be used to custom-tailor news feeds based on information that a user has 
We review methods for analyzing novelty and then describe Newsjunkie, a system that personalizt 
by identifying the novelty of stories in the context of stories they have already reviewed. Newsjunl 
novelty-analysis algorithms that represent articl ... 



Keywords: news, novelty detection, personalization 



3 Extracting usability information from user interface events

David M. Hilbert, David F. Redmiles 

December 2000 ACM Computing Surveys (CSUR), Volume 32 Issue 4 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Modern window-based user interface systems generate user interface events as natural products c 
operation. Because such events can be automatically captured and because they indicate user beh 
to an application's user interface, they have long been regarded as a potentially fruitful source of i 
regarding application usage and usability. However, because user interface events are typically vo 
in detail, automated support is generally ... 

Keywords: human-computer interaction, sequential data analysis, usability testing, user interface

M. G. Brown, J. T. Foote, G. J. F. Jones, K. Sparck Jones, S. J. Young

February 1997 Proceedings of the fourth ACM international conference on Multimedia 

Full text available: pdf(1.82 MB) Additional Information: full citation, references, citings, index terms



Keywords: audio indexing, browsing, content-based retrieval, information retrieval, speech recog 
spotting 



Multimedia information

Dulce Ponceleon, Savitha Srinivasan 

October 2001 Proceedings of the tenth international conference on Information and knowledge management

Full text available: pdf(117.97 KB) Additional Information: full citation, abstract, references, index terms

This paper addresses the problem of automatic detection of salient video segments for real-world . 
as corporate training based on associated speech transcriptions. We present a novel segmentation 
on automatic speech recognition (ASR) applied to the audio track of the video. Our feature set cor 
grams extracted from the imperfect speech transcriptions. We use a two-pass algorithm that comt 
based method with a content-based method. In th ... 

Detecting topical events in digital video

Tanveer Syeda-Mahmood, S. Srinivasan 

October 2000 Proceedings of the eighth ACM international conference on Multimedia 

Full text available: pdf(1.04 MB) Additional Information: full citation, abstract, references, citings, index terms

The detection of events is essential to high-level semantic querying of video databases. It is also a 
problem requiring the detection and integration of evidence for an event available in multiple infor 
modalities, such as audio, video and language. This paper focuses on the detection of specific type 
namely, topic of discussion events that occur in classroom/ lecture environments. Specifically, we f 
driven approach to the detection of topic of ... 

Keywords: multi-modal fusion, query-driven topic detection, slide detection, topic of discussion e 
audio events 



Early user-system interaction for database selection in massive domain-specific online environments
Jack G. Conrad, Joanne R. S. Claussen

January 2003 ACM Transactions on Information Systems (TOIS), Volume 21 Issue 1

Full text available: pdf(845.54 KB) Additional Information: full citation, abstract, references, index terms

The continued growth of very large data environments such as Westlaw and Dialog, in addition to 
Web, increases the importance of effective and efficient database selection and searching. Current 
largely on completely autonomous and automatic selection, searching, and results merging in disti 
environments. This fully automatic approach has significant deficiencies, including reliance upon th 
which databases with relevant documents are not search ... 

Keywords: Database selection, metadata for retrieval, structuring information to aid search and i 
interaction 

Temporal summaries of new topics
James Allan, Rahul Gupta, Vikas Khandelwal 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

We discuss technology to help a person monitor changes in news coverage over time. We define t< 
summaries of news stories as extracting a single sentence from each event within a news topic, w 
are presented one at a time and sentences from a story must be ranked before the next story can 
We explain a method for evaluation, and describe an evaluation corpus that we have built. We als< 
methods for constructing temporal summaries and evalu ... 

Keywords: experimental design, metrics, summarization 



Gabriel L. Somlo, Adele E. Howe 

July 2003 Proceedings of the second international joint conference on Autonomous agents and multiagent systems

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Personalized information agents can help overcome some of the limitations of communal Web info 
such as portals and search engines. Two important components of these agents are: user profiles 
filtering or gathering services. Ideally, these components can be separated so that a single user pi 
leveraged for a variety of information services. Toward that end, we are building an information ac 
SurfAgent) in previous studies, we have develope ... 

Keywords: information agents, query generation, user modeling 



Cross-lingual C*ST*RD: English access to Hindi information

Anton Leuski, Chin-Yew Lin, Liang Zhou, Ulrich Germann, Franz Josef Och, Eduard Hovy 

September 2003 ACM Transactions on Asian Language Information Processing (TALIP), Volume

Full text available: pdf Additional Information: full citation, abstract, references, index terms

We present C*ST*RD, a cross-language information delivery system that supports cross-language 
retrieval, information space visualization and navigation, machine translation, and text summariza 
documents and clusters of documents. C*ST*RD was assembled and trained within 1 month, in th 
DARPA's Surprise Language Exercise, that selected as source a heretofore unstudied language, Hir 
brief time, we could not create deep Hindi capabilities for all th ... 

Keywords: Cross-language information retrieval, Hindi-to-English machine translation, headline c 
information retrieval and information space navigation, single- and multi-document text summariz 



James Allan, Courtney Wade, Alvaro Bolivar 

July 2003 Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Previous research in novelty detection has focused on the task of finding novel material, given a si 
documents on a certain topic. This study investigates the more difficult two-part task defined by tl 
novelty track: given a topic and a group of documents relevant to that topic, 1) find the relevant s 
the documents, and 2) find the novel sentences from the collection of relevant sentences. Our resi 
the former step appears to be the more diffic ... 
Keywords: TREC, novelty, redundancy, relevant sentences 

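12 Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002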

James Allan, Jay Aslam, Nicholas Belkin, Chris Buckley, Jamie Callan, Bruce Croft, Sue Dumais, Norb 
Harman, David J. Harper, Djoerd Hiemstra, Thomas Hofmann, Eduard Hovy, Wessel Kraaij, John Lafl 
Lavrenko, David Lewis, Liz Liddy, R. Manmatha, Andrew McCallum, Jay Ponte, John Prager, Dragomh 
Resnik, Stephen Robertson, Roni Rosenfeld, Salim Roukos, Mark Sanderson, Rich Schwartz, Amit Sin 
Smeaton, Howard Turtle, Ellen Voorhees, Ralph Weischedel, Jinxi Xu, ChengXiang Zhai 
April 2003 ACM SIGIR Forum, Volume 37 Issue 1

Full text available: pdf Additional Information: full citation



13 Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies

Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan 

August 1998 The VLDB Journal - The International Journal on Very Large Data Bases, Volume

Full text available: pdf(231.32 KB) Additional Information: full citation, abstract, citings, index terms

We explore how to organize large text databases hierarchically by topic to aid better searching, br 
filtering. Many corpora, such as internet directories, digital libraries, and patent databases are mai 
into topic hierarchies, also called taxonomies. Similar to indices for relational data, taxonomies mc 
access more efficient. However, the exponential growth in the volume of on-line textual informatic 
impossible to maintain such taxono ... 

14 Information filtering and information retrieval: two sides of the same coin?

Nicholas J. Belkin, W. Bruce Croft

December 1992 Communications of the ACM, Volume 35 Issue 12

Full text available: pdf Additional Information: full citation, references, citings, index terms, review



Keywords: information filtering, information retrieval 



15 Search 

unstructured text 
David A. Smith 

July 2002 Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Digital libraries of historical documents provide a wealth of information about past events, often in 
form. Once dates and place names are identified and disambiguated, using methods that can diffe 
examine collocations to detect events. Collocations can be ranked by several measures, which var 
according to type of events, but the log-likelihood measure (-2 log λ) offers a reasonable balar 
frequently and infrequently mentioned events and ... 

Keywords: event detection, geographic visualization, phrase browsing 

W. Hurst, R. Muller

October 1999 Proceedings of the seventh ACM international conference on Multimedia (Part 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

In order to improve the acceptance of recorded presentations, we introduce a new open document 
wide range of different media classes typically appearing in this scenario. Instances of this docum* 
replayed using our time-based synchronization model. Random access in combination with the rea 
stream/media-layered synchronization mechanism results in essential features such as Random Vi 
Unrestricted Cross-Referencing ... 

17 Machine learning in automated text categorization
Fabrizio Sebastiani

March 2002 ACM Computing Surveys (CSUR), Volume 34 Issue 1

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

The automated categorization (or classification) of texts into predefined categories has witnessed ; 
interest in the last 10 years, due to the increased availability of documents in digital form and the 
organize them. In the research community the dominant approach to this problem is based on ma 
techniques: a general inductive process automatically builds a classifier by learning, from a set of 
documents, the characteristics of the categories. ... 

Keywords: Machine learning, text categorization, text classification 



18 Information retrieval: Query-free news search

Monika Henzinger, Bay-Wei Chang, Brian Milch, Sergey Brin 

May 2003 Proceedings of the twelfth international conference on World Wide Web 

Full text available: pdf(126.70 KB) Additional Information: full citation, abstract, references, citings, index terms

Many daily activities present information in the form of a stream of text, and often people can ben 
additional information on the topic discussed. TV broadcast news can be treated as one such strea 
paper we discuss finding news articles on the web that are relevant to news currently being broadi 
a variety of algorithms for this problem, looking at the impact of inverse document frequency, stei 
compounds, history, and query length on the relevance a ... 

Keywords: query-free search, web information retrieval 



Filtering: Improving realism of topic tracking evaluation 
Anton Leuski, James Allan 

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Topic tracking and information filtering are models of interactive tasks, but their evaluations are g 
way that does not reflect likely usage. The models either force frequent judgments or disallow any 
the user is always available to make a judgment, and do not allow for user fatigue. In this study v\ 
evaluation framework for topic tracking to incorporate those more realistic issues. We demonstrat< 
can be done in a realistic interactive se ... 

Keywords: filtering, interactive tracking, topic detection and tracking 



Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, Jason Zien

November 2003 Proceedings of the twelfth international conference on Information and knowledge management

Full text available: pdf Additional Information: full citation, abstract, references, index terms

We present an efficient query evaluation method based on a two level approach: at the first level, 
iterates in parallel over query term postings and identifies candidate documents using an approxin 
taking into account only partial information on term occurrences and no query independent factors 
level, promising candidates are fully evaluated and their exact scores are computed. The efficiency 
process can be improved signific ... 

Keywords: WAND, document-at-a-time, efficient query evaluation 

1 Towards speech as a knowledge resource

Eric Brown, Savitha Srinivasan, Anni Coden, Dulce Ponceleon, James Cooper, Arnon Amir, Jan 
Pieper 

October 2001 Proceedings of the tenth international conference on Information and 
knowledge management 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Speech is a tantalizing mode of human communication. On the one hand, humans 
understand speech with ease and use speech to express complex ideas, information, and 
knowledge. On the other hand, automatic speech recognition with computers is still very 
hard, and extracting knowledge from speech is even harder. In this paper we motivate the 
study of speech as a knowledge resource and briefly survey a family of related applications 
and systems being developed at IBM Research aimed towards the goal o ... 

2 Information retrieval: Query-free news search

Monika Henzinger, Bay-Wei Chang, Brian Milch, Sergey Brin 

May 2003 Proceedings of the twelfth international conference on World Wide Web 

Additional Information: full citation, abstract, references, citings, index terms

Many daily activities present information in the form of a stream of text, and often people 
can benefit from additional information on the topic discussed. TV broadcast news can be 
treated as one such stream of text; in this paper we discuss finding news articles on the web 
that are relevant to news currently being broadcast. We evaluated a variety of algorithms for 
this problem, looking at the impact of inverse document frequency, stemming, compounds, 
history, and query length on the relevance a ... 



Keywords: query-free search, web information retrieval 

July 1986 Computational Linguistics, Volume 12 Issue 3
Full text available: pdf(2.25 MB)

4 Query evaluation techniques for large databases
Goetz Graefe 

June 1993 ACM Computing Surveys (CSUR), Volume 25 issue 2 

Full text available: pdf(9.37 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Database management systems will continue to manage large data volumes. Thus, efficient 
algorithms for accessing and manipulating large sets and sequences will be required to 
provide acceptable performance. The advent of object-oriented and extensible database 
systems will not solve this problem. On the contrary, modern data models exacerbate the 
problem: In order to manipulate large sets of complex objects as efficiently as today's 
database systems manipulate simple records, query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 



5 Searching the Web

August 2001 ACM Transactions on Internet Technology (TOIT), Volume 1 Issue 1

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review

We offer an overview of current Web search engine design. After introducing a generic 
search engine architecture, we examine each engine component in turn. We cover crawling, 
local Web page storage, indexing, and the use of link analysis for boosting search 
performance. The most common design and implementation techniques for each of these 
components are presented. For this presentation we draw from the literature and from our 
own experimental search engine testbed. Emphasis is on introduci ... 

Keywords: HITS, PageRank, authorities, crawling, indexing, information retrieval, link 
analysis, search engine 



6 Hancock: A language for analyzing transaction streams

Corinna Cortes, Kathleen Fisher, Daryl Pregibon, Anne Rogers, Frederick Smith 

March 2004 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 26 Issue 2

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Massive transaction streams present a number of opportunities for data mining techniques. 
The transactions in such streams might represent calls on a telephone network, commercial 
credit card purchases, stock market trades, or HTTP requests to a web server. While 
historically such data have been collected for billing or security purposes, they are now 
being used to discover how the transactors, for example, credit-card numbers or IP 
addresses, use the associated services. Over the past 5 years, w ... 

Keywords: Domain-specific languages, data mining, statistical models 

David M. Hilbert, David F. Redmiles

December 2000 ACM Computing Surveys (CSUR), Volume 32 Issue 4

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review

Modern window-based user interface systems generate user interface events as natural 
products of their normal operation. Because such events can be automatically captured and 
because they indicate user behavior with respect to an application's user interface, they 
have long been regarded as a potentially fruitful source of information regarding application 
usage and usability. However, because user interface events are typically voluminous and rich 
in detail, automated support is generally ... 

Keywords: human-computer interaction, sequential data analysis, usability testing, user 
interface event monitoring 

engines

Ronny Lempel, Shlomo Moran 

May 2003 Proceedings of the twelfth international conference on World Wide Web 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

We study the caching of query result pages in Web search engines. Popular search engines 
receive millions of queries per day, and efficient policies for caching query results may 
enable them to lower their response time and reduce their hardware requirements. We 
present PDC (probability driven cache), a novel scheme tailored for caching search results, 
that is based on a probabilistic model of search engine users. We then use a trace of over 
seven million queries submitted to the search engine A ... 

Keywords: caching, query processing and optimization 



9 Streams, structures, spaces, scenarios, societies (5S): A formal model for digital libraries

Marcos André Gonçalves, Edward A. Fox, Layne T. Watson, Neill A. Kipp

April 2004 ACM Transactions on Information Systems (TOIS), Volume 22 Issue 2

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Digital libraries (DLs) are complex information systems and therefore demand formal 
foundations lest development efforts diverge and interoperability suffers. In this article, we 
propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and 
Societies (5S), which allow us to define digital libraries rigorously and usefully. Streams are 
sequences of arbitrary items used to describe both static and dynamic (e.g., video) content. 
Structures can be viewed as labeled directed gra ... 

Keywords: applications, definitions, foundations, taxonomy



10 Beyond document similarity: understanding value-based search and browsing technologies

Andreas Paepcke, Hector Garcia-Molina, Gerard Rodriguez-Mula, Junghoo Cho
March 2000 ACM SIGMOD Record, Volume 29 Issue 1

Full text available: pdf Additional Information: full citation, abstract, citings, index terms

In the face of small, one or two word queries, high volumes of diverse documents on the 
Web are overwhelming search and ranking technologies that are based on document 
similarity measures. The increase of multimedia data within documents sharply exacerbates 
the shortcomings of these approaches. Recently, research prototypes and commercial 
experiments have added techniques that augment similarity-based search and ranking. 
These techniques rely on judgments about the 'value' of documents. Jud ... 

Keywords: World-Wide Web, collaborative filtering, hypertext, information filters, 
information retrieval, links, metadata, ranking, relevance, search engines 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference on

Management of data

Full text available: pdf Additional Information: full citation, abstract, references

Subsequence similarity matching in time series databases is an important research area for 
many applications. This paper presents a new approximate approach for automatic online 
subsequence similarity matching over massive data streams. With a simultaneous on-line 
segmentation and pruning algorithm over the incoming stream, the resulting piecewise 
linear representation of the data stream features high sensitivity and accuracy. The 
similarity definition is based on a permutation followed by a met ... 

Craig E. Wills, Mikhail Mikhailov, Hao Shang

October 2003 Proceedings of the 2003 ACM SIGCOMM conference on Internet 
measurement 

Full text available: pdf(257.55 KB) Additional Information: full citation, abstract, references, index terms

In this work, we propose a novel methodology that can be used to assess the relative 
popularity for any Internet application based on the data servers it uses. The basic idea is to 
infer popularity of data servers by periodically "poking" at local Domain Name servers 
(LDNSs) that service Domain Name System requests from a set of users running Internet 
applications and determining if LDNSs have cached resource records for the data servers. 
This approach allows us to measure the relative percentag ... 

Keywords: active content measurement, domain name system 

January 2001 Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries

Full text available: pdf(356.53 KB) Additional Information: full citation, abstract, references, index terms

We have applied speech recognition and text-mining technologies to a set of recorded 
outbound marketing calls and analyzed the results. Since speaker-independent speech 
recognition technology results in a significantly lower recognition rate than that found when 
the recognizer is trained for a particular speaker, we applied a number of post-processing 
algorithms to the output of the recognizer to render it suitable for the Textract text mining 
system. We indexed the call transcri ... 

Keywords: document display, search, speech analysis, speech retrieval, text mining 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 

17 An overview of query optimization in relational systems 
Surajit Chaudhuri 

May 1998 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium 
on Principles of database systems 

Full text available: pdf(1.42 MB) Additional Information: full citation, references, citings, index terms

Full text available: pdf(437.86 KB) Additional Information: full citation, abstract, references



Querying XML data is a well-explored topic with powerful database-style query languages 
such as XPath and XQuery set to become W3C standards. An equally compelling paradigm 
for querying XML documents is full-text search on textual content. In this paper, we study 
fundamental challenges that arise when we try to integrate these two querying 
paradigms. While keyword search is based on approximate matching, XPath has exact match 
semantics. We address this mismatch by considering queries on structure ... 

19 DaiS.Mreanis.an 
5tream|ng.t[m 

Like Gao, Zhengrong Yao, X. Sean Wang 

November 2002 Proceedings of the eleventh international conference on Information and knowledge management

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

For many applications, it is important to quickly locate the nearest neighbor of a given time 
series. When the given time series is a streaming one, nearest neighbors may need to be 
found continuously at all time positions. Such a standing request is called a continuous 
nearest neighbor query. This paper seeks fast evaluation of continuous queries on large 
databases. The initial strategy is to use the result of one evaluation to restrict the search 
space for the next. A more fundamental i ... 

Keywords: continuous query, nearest neighbor, streaming time series 